On Parameterized Tiled Loop Generation and Its Parallelization

Authors

  • DaeGon Kim
  • Sanjay V. Rajopadhye
Abstract

Tiling is a loop transformation that decomposes a computation into a set of smaller computation blocks. The transformation has proved useful for many high-level program optimizations, such as data locality optimization and exploitation of coarse-grained parallelism, and crucial for architectures with limited resources, such as embedded systems, GPUs, and the Cell. Data locality and parallelism will continue to serve as major vehicles for achieving high performance on modern architectures. Parameterized tiling is tiling in which the block size is not fixed at compile time but remains a symbolic constant that can be selected or changed even at runtime. Parameterized tiled loops facilitate iterative and runtime optimizations, such as iterative compilation, auto-tuning, and dynamic program adaptation. Existing solutions to parameterized tiled loop generation are either restricted to perfectly nested loops or difficult to parallelize on distributed memory systems, and even on shared memory systems when a program does not have synchronization-free parallelism. We present an approach to parameterized tiled loop generation for imperfectly nested loops. We employ a direct extension of the tiled code generation technique for perfectly nested loops, followed by three simple optimizations on the resulting parameterized tiled loops. Both the generation and the optimizations are achieved through purely syntactic processing, so loop generation time remains negligible. Our technique produces code whose efficiency is comparable to that of existing code generation techniques, while the generated code remains simple and suitable for parallelization. We also provide a technique for statically restructuring parameterized tiled loops for wavefront scheduling on shared memory systems. Because the formulation of parameterized tiling does not fit into the well-established polyhedral framework, such static restructuring has been a great challenge. However, we achieve this limited restructuring through syntactic processing, without any sophisticated machinery.
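To make the two central ideas of the abstract concrete, here is a minimal sketch in Python. It is not the authors' code generator: it assumes a plain rectangular n x m iteration space (a perfectly nested loop), and the function names `tiled_points` and `wavefront_tiles` are hypothetical. The first function shows tile sizes kept as runtime values rather than compile-time constants. The second groups tiles into wavefronts, under the additional assumption that dependences point only in the increasing i and j directions, so that tiles on the same anti-diagonal are independent.

```python
def tiled_points(n, m, ti, tj):
    """Visit an n x m iteration space tile by tile.

    ti and tj are ordinary runtime values, so the same loop nest
    works for any tile size (the "parameterized" part).
    """
    order = []
    for it in range(0, n, ti):                        # tile origins along i
        for jt in range(0, m, tj):                    # tile origins along j
            for i in range(it, min(it + ti, n)):      # points inside the tile,
                for j in range(jt, min(jt + tj, m)):  # clipped at the boundary
                    order.append((i, j))
    return order


def wavefront_tiles(n, m, ti, tj):
    """Group tile origins by wavefront number w = tile_row + tile_col.

    Assumption: dependences point only toward larger i and j, so all
    tiles within one wavefront could be executed in parallel.
    """
    ni = -(-n // ti)  # number of tile rows (ceiling division)
    nj = -(-m // tj)  # number of tile columns
    fronts = []
    for w in range(ni + nj - 1):
        lo = max(0, w - nj + 1)   # smallest valid tile row on this front
        hi = min(w, ni - 1)       # largest valid tile row on this front
        fronts.append([(a * ti, (w - a) * tj) for a in range(lo, hi + 1)])
    return fronts
```

Calling `tiled_points(5, 4, 2, 3)` and `tiled_points(5, 4, 4, 1)` visits the same 20 points in different orders, which is what lets an auto-tuner change the tile size without regenerating code. In a shared memory setting, each list returned by `wavefront_tiles` would correspond to one parallel step separated from the next by a barrier.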

Similar resources

Tools for Performance Optimizations and Tuning of Affine Loop Nests

Multicore processors have become mainstream, and the number of cores in a chip will continue to increase every year. Programming these architectures to effectively exploit their very high computation power is a nontrivial task. First, an application program needs to be explicitly restructured using a set of code transformation techniques to optimize for specific architectural features, especial...

Computer Science Technical Report Canonic Multi-Projection: Memory Allocation for Distributed Memory Parallelization

The Polyhedral model is now the accepted technology for automatic parallelization of affine control loop programs. It has been successful in automatically generating tiled shared memory parallel programs for shared memory platforms (plus vectorization). We address the challenges arising when we move toward distributed memory parallelization, based on wavefront execution of parameterized tiles. ...

Data Parallel Code Generation for Arbitrarily Tiled Loop Nests

Tiling or supernode transformation is extensively discussed as a loop transformation to efficiently execute nested loops onto distributed memory machines. In addition, a lot of work has been done concerning the selection of a communication-minimal and a scheduling-optimal tiling transformation. However, no complete approach has been presented in terms of implementation for non-rectangularly til...

On Code-Generation in the Polyhedral Model

Automatic parallelization in the polyhedral model is based on affine transformations from an original computation domain (iteration space) to a target space-time domain, often with a different transformation for each variable. Code generation is an often ignored step in this process that has a significant impact on the quality of the final code. Previous code generation methods are based on loop spli...

Compiler Parallelization Techniques for Tiled Multicore Processors

Recently, tiled multicore processors have been proposed as a solution to provide both performance and scalability. Unlike conventional multicore processors, tiled microprocessors provide on-chip networks to exploit fine-grained parallelism. However, the performance of tiled microprocessors largely depends on compilers because of their relatively simple hardware; exploitation of parallelism, com...

Publication date: 2010